Bayesian Inference from Scratch 

Jakub Mielczarek*

Astronomical Observatory, Jagiellonian University, Orla 171, 30-244 Krakow, Poland

Marek Szydlowski†

Mark Kac Complex Systems Research Centre, 
Jagiellonian University, Reymonta 4, 30-059 Krakow, Poland and 
Department of Theoretical Physics, Faculty of Philosophy, 
The John Paul II Catholic University of Lublin, 
Al. Raclawickie 14, 20-950 Lublin, Poland, 

Pawel Tambor‡
Department of Theoretical Physics, Faculty of Philosophy, 
The John Paul II Catholic University of Lublin, 
Al. Raclawickie 14, 20-950 Lublin, Poland 

We study epistemological and philosophical aspects of the Bayesian approach in different areas of science. The basic intuition behind the Bayesian framework, together with a pedagogical introduction to it, is given as a basis for a further discussion of Bayesian inference in physics. We claim that Bayesian inference is subject to some epistemic limitations. We also point out paradoxes of confirmation, like Goodman's paradox, appearing in the Bayesian Theory of Confirmation in the context of cosmological applications.



I. INTRODUCTION 

In everyday experience, even when we do not realize it, we use our intuition to make inferences. For example, when we hear the doorbell we immediately start to ask ourselves



*Electronic address: jakubm@poczta.onet.pl
†Electronic address: uoszydlo@cyf-kr.edu.pl
‡Electronic address: xpt76@poczta.fm



"who has come?". On the way to the door we consider many different possibilities. Maybe it is our friend, who said some days ago that he would visit us. Maybe it is our neighbor, who has come to ask us to turn down the music, or maybe somebody has come to inform us that we have won 1 000 000 euro in the lottery, and so on. All of these possibilities are more or less probable, depending on our knowledge and observations. If, for example, we look through the window and see a car on the street that looks like our friend's, then we are almost sure that it is in fact him. Or, if we really are listening to music very loudly, we can expect that sooner or later somebody will get angry. When we do not expect anyone we know, we assume that it may be the postman or someone who has the wrong address. In fact we always choose the simplest case. This means that we never assume that Superman is standing behind the door, even if there is a reason why he should visit us. This intuitive tendency to choose the simplest solution is called Occam's razor [1] and is formally stated as

Accept the simplest explanation that fits the data.

In our example the data are in fact what we already know. The example presented is very easy, and our intuition copes very well with problems of this kind. However, there is a vast variety of problems for which the human brain is not suited or needs too much time. On the other hand, there are problems which are too tedious to be solved by humans, and we would like to cast this intuition somehow into the form of a computer program. So can we capture this intuitive feeling in a mathematically defined theory and use it instead of the brain? The answer is yes: this exciting idea is embodied in Bayesian inference [2].

Below we introduce Bayesian inference and show how it works in practice. We start from general considerations which lead us to the connection with thermodynamics. Subsequently we show how to implement problems on a computer using the Monte Carlo approach (Metropolis algorithm). We describe possible applications in modern cosmology. In the last section we try to put Bayesian inference into a larger perspective, covering epistemological aspects of the method as well as its suggested limitations.






II. BAYESIAN INFERENCE AND THERMODYNAMICS

Our goal, as motivated above, is to compare alternative theories. Of course these theories can have different origins. They can be connected with everyday experience, data analysis, biology, physics etc. Because we ultimately want to apply Bayesian inference in physics, we can restrict ourselves, without loss of generality, to physical theories. So let us imagine some unknown physical phenomenon and say we possess $K$ theories $\{H_1, \dots, H_K\}$ that can potentially describe it. All these theories differ from one another. The information about the investigated phenomenon is contained in the collection of experimental data $D$. Both the theories and the experimental data can be considered elements of the same set $\Omega$, so $\{H_1, \dots, H_K, D\} \in \Omega$. The set $\Omega$ together with a measure $P$ and a $\sigma$-algebra $\mathcal{F}$ forms a probability space $(\Omega, \mathcal{F}, P)$. In such a well-defined theory of probability the concept of conditional probability arises naturally. The probability of a given theory $H_i$, when we have the data $D$, is defined as

$$P(H_i|D) = \frac{P(H_i \cap D)}{P(D)}. \qquad (1)$$

This probability tells us which theory describes the experimental data $D$ better and is called the posterior probability. On the other hand, we can ask about the probability of the outcomes $D$ when the theory $H_i$ is the true one,

$$P(D|H_i) = \frac{P(D \cap H_i)}{P(H_i)}. \qquad (2)$$

This probability tells us about the different predictions $D$ of the theory $H_i$ and is a marginal likelihood, commonly called the evidence. Because $P(H_i \cap D) = P(D \cap H_i)$ we can combine equations (1) and (2), which gives

$$P(H_i|D) = \frac{P(D|H_i)\, P(H_i)}{P(D)}. \qquad (3)$$

This equation is the famous Bayes theorem. The probability $P(H_i)$ in this equation is called the prior probability, and it is in fact hard to define. It describes our initial beliefs about a given theory. Choosing this number is a human factor and can be non-objective. Sometimes one theory is judged markedly more probable than another because of, for example, its mathematical beauty. Another factor is that a theory may describe well a variety of other similar phenomena. However, if we have no strong motivation to introduce some initial selection of the theories, then the most natural choice is to assume a homogeneous distribution of the prior probability

$$P(H_i) = \frac{1}{K}. \qquad (4)$$



Then none of the theories is favored. The probability $P(D)$ is an uninteresting normalization constant, which we can calculate from the normalization condition

$$\sum_{i=1}^{K} P(H_i|D) = 1, \qquad (5)$$

which together with the Bayes theorem gives

$$P(D) = \sum_{i=1}^{K} P(D|H_i)\, P(H_i). \qquad (6)$$



Each of the theories $\{H_1, \dots, H_K\}$ contains some number of parameters, described by the vector $\theta_i$ for a particular theory. Simple theories (in general not mathematically simple) contain a small number of parameters, which leads to limited predictions. More complicated theories, or perhaps better those lacking mathematical beauty (though not in general), contain a large number of parameters. Effective theories belong to this type.




FIG. 1: Evidence for easy and complicated theories 



The values of the parameters are thus also elements of the set $\Omega$ if they are not fixed. Previously the parameters were fixed in the definitions of the theories. In general, when the parameters are not fixed, the evidence is a marginal probability, integrated over the allowed values of the parameters of the model,

$$P(D|H_i) = \int d\theta_i\, P(D|\theta_i, H_i)\, P(\theta_i|H_i). \qquad (7)$$
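The way the integral (7) penalizes a theory whose parameters are allowed to range widely can be illustrated with a minimal numerical sketch. Everything specific in it is an assumption made for illustration only: a single parameter, a flat prior, one hypothetical measurement with a Gaussian likelihood, and the helper name `evidence_uniform_prior`.

```python
import math

def evidence_uniform_prior(log_like, lo, hi, n=20001):
    """Approximate eq. (7) for one parameter with a flat prior on [lo, hi],
    using the trapezoidal rule on n grid points."""
    h = (hi - lo) / (n - 1)
    prior = 1.0 / (hi - lo)          # normalized flat prior density P(theta|H)
    total = 0.0
    for k in range(n):
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * math.exp(log_like(lo + k * h)) * prior
    return total * h

def log_like(theta, y=1.0, sigma=0.5):
    """Hypothetical likelihood: one measurement y with Gaussian error sigma."""
    return -0.5 * ((theta - y) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

# Same likelihood, two priors: the second allows a ten times wider range.
narrow = evidence_uniform_prior(log_like, -2.0, 4.0)
wide = evidence_uniform_prior(log_like, -20.0, 40.0)
print(narrow, wide, narrow / wide)
```

Both priors contain the region favored by the data, yet the wider one spreads its probability over many parameter values that the data exclude, so its evidence is lower by roughly the ratio of the prior widths.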

Now we can go back to the Bayes theorem and explain the idea of Bayesian inference. Writing the Bayes theorem (3) for two models $i$ and $j$ and dividing the two equations side by side we obtain

$$B_{ij} \equiv \frac{P(H_i|D)}{P(H_j|D)} = \frac{P(H_i)}{P(H_j)}\, \frac{P(D|H_i)}{P(D|H_j)}. \qquad (8)$$

The quantity $B_{ij}$ introduced here is called the Bayes factor.

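So the main problem now is to calculate the evidence. A direct calculation is generally impossible. Our goal is to use Monte Carlo methods to do it. We first introduce the parameter $\lambda$ and redefine the evidence in the form

$$P(D|H_i)_\lambda = \int d\pi_i\, P^\lambda(D|\theta_i, H_i), \qquad (9)$$

where

$$d\pi_i = d\theta_i\, P(\theta_i|H_i). \qquad (10)$$

The quantity $P(D|\theta_i, H_i)$ is in fact the likelihood, and for simplicity we call it $L$. So

$$\frac{d \log P(D|H_i)_\lambda}{d\lambda} = \frac{\int d\pi_i\, L^\lambda \log L}{\int d\pi_i\, L^\lambda} = \langle \log L \rangle_\lambda \qquad (11)$$

and

$$P(D|H_i) = \exp \int_0^1 d\lambda\, \frac{d \log P(D|H_i)_\lambda}{d\lambda} = \exp \int_0^1 d\lambda\, \langle \log L \rangle_\lambda. \qquad (12)$$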

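Now we can show the connection between our approach and thermodynamics. Introducing

$$E = -\log L, \qquad (13)$$
$$1/T = \lambda, \qquad (14)$$
$$Z = \int d\pi_i\, L^\lambda, \qquad (15)$$

we obtain

$$Z = \int d\pi_i\, e^{-E/T}, \qquad (16)$$

and equation (11) takes the familiar form

$$\langle E \rangle_\lambda = -\langle \log L \rangle_\lambda = \frac{\int d\pi_i\, E\, e^{-E/T}}{\int d\pi_i\, e^{-E/T}}. \qquad (17)$$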

This is the energy of the system at temperature $T$. When we calculate it for different temperatures we can directly evaluate the integral in expression (12) and finally the evidence. As we see, to perform Bayesian inference we must calculate the thermodynamic integral (17). Integrals of this kind can be solved analytically only for very simple systems. In general, numerical methods are the appropriate way to solve problems of this kind. These methods are known as Monte Carlo methods.
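Before turning to Monte Carlo sampling, the identity (12) can be checked on a case simple enough for direct quadrature. The sketch below is only an illustration under stated assumptions: one parameter, a flat normalized prior on a hypothetical interval, and an unnormalized Gaussian likelihood. It evaluates the mean energy (17) on a grid for a set of $\lambda$ values and compares the thermodynamic integral with the directly integrated evidence.

```python
import math

def trapz(ys, xs):
    """Trapezoidal rule for samples ys on the grid xs."""
    return sum(0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k])
               for k in range(len(xs) - 1))

# Hypothetical toy model: flat prior on [-2, 4], L = exp(-(theta - 1)^2 / (2 * 0.5^2)).
thetas = [-2.0 + 6.0 * k / 600 for k in range(601)]
prior = [1.0 / 6.0] * len(thetas)                         # P(theta|H), normalized
energy = [0.5 * ((t - 1.0) / 0.5) ** 2 for t in thetas]   # E = -log L, eq. (13)

def partition(lam):
    """Z(lambda) = integral of dpi L^lambda, eq. (15)."""
    return trapz([p * math.exp(-lam * e) for p, e in zip(prior, energy)], thetas)

def mean_energy(lam):
    """<E>_lambda, eq. (17), evaluated by quadrature instead of sampling."""
    num = trapz([p * e * math.exp(-lam * e) for p, e in zip(prior, energy)], thetas)
    return num / partition(lam)

lams = [k / 200 for k in range(201)]                      # lambda grid on [0, 1]
log_evidence_ti = -trapz([mean_energy(l) for l in lams], lams)   # eq. (12)
log_evidence_direct = math.log(partition(1.0))                   # eq. (7)
print(log_evidence_ti, log_evidence_direct)
```

The two numbers agree because $Z(\lambda = 0) = 1$ for a normalized prior, which is what allows the integral over $\lambda$ to start at zero. In realistic problems the quadrature over the parameters is replaced by a Markov chain, while the integration over $\lambda$ stays the same.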



x    y    Δy
1    1    7
2    3    3
3    5    4
4    7    6
5    10   3
6    15   1



III. SIMPLE EXAMPLE 

Now we have all the theoretical equipment needed to solve a practical problem. We consider a very simple class of theories and a small sample of data points to keep the computation short. It is in fact a pedagogical rather than a practical problem. In this example we want to show how to perform Bayesian inference in the simplest case; the generalization to more advanced problems is straightforward. In the next section we mention how to apply Bayesian methods to more complicated problems.

Let us consider an imagined experiment in which we measure some physical variable $y$ for six different values of the parameter $x$. In the experiment we also determine the error bars of the outcomes $y$. In fact we can repeat the measurements many times; then $y$ is the mean value and the error bars are calculated as the dispersion of the outcomes for a particular value of the parameter $x$. These data points are presented in Table I.

The phenomenon which we investigate is still not understood, but we possess three alternative models to describe it. We list these models below:

model 1: $y(x) = a_0 + a_1 x$, (18)
model 2: $y(x) = a_0 + a_1 x + a_2 x^2$, (19)
model 3: $y(x) = a_0 + a_2 x^2$. (20)

All models are described by simple polynomial functions. Models 1 and 3 look simpler because each is described by two parameters, whereas model 2 contains three unknown parameters.

The first step of Bayesian inference is to fit these models to the experimental data. We can use, for example, the method of least squares. We obtain

model 1: $a_0 = -2.5 \pm 1.1$, $a_1 = 2.7 \pm 0.3$, (21)
model 2: $a_0 = 0.1 \pm 1.0$, $a_1 = 0.28 \pm 0.67$, $a_2 = 0.34 \pm 0.9$, (22)
model 3: $a_0 = 1.1 \pm 0.4$, $a_2 = 0.38 \pm 0.02$. (23)
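A weighted least-squares fit of the three polynomial models can be sketched in a few lines of plain Python. This is an illustration, not the authors' procedure: the helper names `solve` and `fit` are hypothetical, the data are the table values as they survive in this copy of the text, and the sketch therefore need not reproduce the numbers in (21)-(23).

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit(basis, xs, ys, dys):
    """Weighted least squares for y(x) = sum_j a_j * basis[j](x).
    Returns the best-fit parameters and the minimal chi-square."""
    m, n = len(basis), len(xs)
    rows = [[f(x) for f in basis] for x in xs]
    w = [1.0 / d ** 2 for d in dys]
    normal = [[sum(w[i] * rows[i][j] * rows[i][k] for i in range(n))
               for k in range(m)] for j in range(m)]
    rhs = [sum(w[i] * rows[i][j] * ys[i] for i in range(n)) for j in range(m)]
    a = solve(normal, rhs)
    chi2 = sum(w[i] * (sum(a[j] * rows[i][j] for j in range(m)) - ys[i]) ** 2
               for i in range(n))
    return a, chi2

# Data points as listed in the table (error bars taken at face value).
XS, YS, DYS = [1, 2, 3, 4, 5, 6], [1, 3, 5, 7, 10, 15], [7, 3, 4, 6, 3, 1]
MODELS = {
    "model 1": [lambda x: 1.0, lambda x: x],
    "model 2": [lambda x: 1.0, lambda x: x, lambda x: x * x],
    "model 3": [lambda x: 1.0, lambda x: x * x],
}
results = {name: fit(basis, XS, YS, DYS) for name, basis in MODELS.items()}
for name, (params, chi2) in results.items():
    print(name, params, chi2)
```

Because model 2 contains both other models as special cases, its minimal chi-square can never exceed theirs. This is why goodness of fit alone cannot be used to choose between the models.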



The error bars of the parameters give us the intervals necessary for Bayesian inference. We must perform the integration (look for the solution) in a finite parameter space. We assume that the outcomes are drawn from a Gaussian distribution, so the likelihood function takes the form

$$L \propto \exp\left[ -\frac{1}{2} \sum_{i=1}^{N} \frac{[y(x_i) - y_i]^2}{\Delta y_i^2} \right]. \qquad (24)$$



Now we apply the theory described in the previous section. Using the Metropolis algorithm we calculate the energies $\langle E \rangle_\lambda$ for the models considered. We perform the calculations for values of $\lambda$ in the range $(0, 1)$, as is necessary to calculate the integral in equation (12). We show these results in Figs. 2, 3 and 4.
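A minimal Metropolis sketch for the mean energy of the third model is given below. It is not the code used to produce the figures. The prior box `BOX`, the proposal step, the chain length and the random seed are all assumptions chosen for illustration, and the data are the table values as they survive in this copy of the text.

```python
import math
import random

XS, YS, DYS = [1, 2, 3, 4, 5, 6], [1, 3, 5, 7, 10, 15], [7, 3, 4, 6, 3, 1]

def energy(a0, a2):
    """E = -log L for y = a0 + a2 x^2 with a Gaussian likelihood, up to a constant."""
    return 0.5 * sum(((a0 + a2 * x * x - y) / d) ** 2
                     for x, y, d in zip(XS, YS, DYS))

def mean_energy(lam, box, n_steps=20000, burn_in=2000, step=0.1, seed=0):
    """Estimate <E>_lambda by sampling a density proportional to exp(-lam * E)
    inside a flat prior box with the Metropolis algorithm."""
    rng = random.Random(seed)
    (lo0, hi0), (lo2, hi2) = box
    a0, a2 = 0.5 * (lo0 + hi0), 0.5 * (lo2 + hi2)    # start in the middle of the box
    e = energy(a0, a2)
    total, count = 0.0, 0
    for t in range(n_steps):
        b0 = a0 + rng.gauss(0.0, step * (hi0 - lo0))
        b2 = a2 + rng.gauss(0.0, step * (hi2 - lo2))
        if lo0 <= b0 <= hi0 and lo2 <= b2 <= hi2:    # flat prior: never leave the box
            e_new = energy(b0, b2)
            # Metropolis rule: accept with probability min(1, exp(-lam * dE)).
            if e_new <= e or rng.random() < math.exp(-lam * (e_new - e)):
                a0, a2, e = b0, b2, e_new
        if t >= burn_in:
            total += e
            count += 1
    return total / count

BOX = ((-5.0, 5.0), (0.0, 1.0))      # hypothetical prior ranges for a0 and a2
e_hot = mean_energy(0.0, BOX)        # lambda = 0: average over the prior alone
e_cold = mean_energy(1.0, BOX)       # lambda = 1: average over the posterior
print(e_hot, e_cold)
```

Repeating the call on a grid of $\lambda$ values between 0 and 1 gives a curve of the kind integrated in (12). The mean energy is large at small $\lambda$, where the chain explores the whole prior box, and drops as $\lambda$ grows and the chain concentrates near the best fit.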



FIG. 2: $\langle E \rangle_\lambda$ dependence for the first model.



Using these data we can now evaluate the integrals

$$I_i = \int_0^1 d\lambda\, \langle E \rangle_\lambda. \qquad (25)$$

From equation (12) we see that

$$P(D|H_i) = \exp \int_0^1 d\lambda\, \langle \log L \rangle_\lambda = e^{-I_i}. \qquad (26)$$

The values obtained for the three models considered are

$$I_1 = 5.8, \quad I_2 = 7.6, \quad I_3 = 4.8. \qquad (27)$$



FIG. 3: $\langle E \rangle_\lambda$ dependence for the second model.

FIG. 4: $\langle E \rangle_\lambda$ dependence for the third model.

Now we can directly calculate the Bayes factors

$$B_{12} = 6.06, \qquad (28)$$
$$B_{23} = 0.06, \qquad (29)$$
$$B_{13} = 0.36. \qquad (30)$$



We can also directly calculate the posterior probabilities

$$P(H_1|D) = 0.26, \qquad (31)$$
$$P(H_2|D) = 0.04, \qquad (32)$$
$$P(H_3|D) = 0.70. \qquad (33)$$
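The step from the thermodynamic integrals to the Bayes factors and posterior probabilities is pure arithmetic and can be sketched as follows. Two assumptions are made here: equal priors for the three models, as in (4), and the value 4.8 for the third integral, which is the value consistent with the quoted Bayes factors. The function names are hypothetical.

```python
import math

def posteriors(integrals, priors=None):
    """Posterior P(H_i|D) from the integrals I_i, using P(D|H_i) = exp(-I_i)
    and the Bayes theorem with the normalization P(D) = sum_i P(D|H_i) P(H_i)."""
    k = len(integrals)
    priors = priors or [1.0 / k] * k            # flat prior unless stated otherwise
    weights = [p * math.exp(-i) for p, i in zip(priors, integrals)]
    norm = sum(weights)
    return [w / norm for w in weights]

def bayes_factor(i_a, i_b):
    """Ratio of evidences exp(-I_a) / exp(-I_b); equals B_ab for equal priors."""
    return math.exp(-(i_a - i_b))

I = [5.8, 7.6, 4.8]                              # I_3 = 4.8 is an assumed value
b12, b23, b13 = bayes_factor(I[0], I[1]), bayes_factor(I[1], I[2]), bayes_factor(I[0], I[2])
post = posteriors(I)
print(round(b12, 2), round(b23, 2), round(b13, 2))
print([round(p, 2) for p in post])
```

With these inputs the sketch agrees with the quoted Bayes factors and posterior probabilities up to rounding of the integrals.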



What we see is that the last model explains the experimental data best, even though it may look more complicated than the linear model number one. The second model can definitely be discarded. This model contains the remaining models as special cases and naturally fits the experimental data better because it has more degrees of freedom. Compared with the other models, however, it is too complicated and does not necessarily explain the experimental data properly. We see that, although the first and the last model possess the same number of degrees of freedom, the last function explains the nature of the investigated physical process better.



IV. APPLICATIONS IN COSMOLOGY 



In this section we would like to present some applications of Bayesian inference in modern cosmology. It is not an accident that we decided to discuss this particular branch of science. The reason is that cosmology really needs these methods, because our knowledge about the Universe is still very limited. We have a lot of theories about different stages of the Universe but only a small number of observations to verify them. So it is the right place for Bayesian methods.



A. Dark energy 

The first case that we would like to discuss is connected with a very mysterious behavior of the Universe, discovered at the end of the last decade. Namely, observations of distant type Ia supernovae (SNIa) indicated that the expansion of the Universe is accelerating [3]. This discovery was apparently in conflict with other observations and with the belief that the Universe is filled with normal matter like dust, stars, planets etc. Under such an assumption the Universe always decelerates. So what is happening with the Universe? Why did it start to accelerate after a previous phase of deceleration? What is the mysterious component of the Universe which we call dark energy? At present there is an enormous number of possible answers to these questions: the presence of the cosmological constant $\Lambda$, a phantom fluid, modified gravity, a quintessence field, quantum gravitational effects, brane world models, vacuum energy and so on. The number of possible solutions is really impressive. So which one of them is the real solution? Which one describes Nature in the right way? Or maybe none of them does, and we must still keep looking for new models. This can of course be true. But let us pause for a moment in the search for ever newer models to check whether we perhaps already know the correct answer. Or, more precisely: which of the models we have copes best with the observational data and is also the simplest one? This is precisely what we can do thanks to Bayesian inference.



B. Can we distinguish quantum gravitational effects from observations? 

This intriguing question concerns possible observational effects of quantum gravity. Quantum gravitational effects are predicted to be very small and out of reach of the present and any future generation of accelerators. However, quantum gravitational effects can survive as fossils from the very early Universe, in which they were dominant. These effects can influence the spectrum of perturbations of the inflaton field [6, 7]. These primordial fluctuations then lead to fluctuations of matter and finally to the formation of structures in the Universe. So can we deduce some information about quantum gravity from observations of the microwave background radiation and of large scale structures? At first sight this can sound strange, because quantum gravity describes microscopic properties of the gravitational field at a very deep level. However, the same effects were very important in the early Universe and could have influenced its global properties. Is there a place here for Bayesian inference? The answer is yes. We now have a lot of predictions from quantum theories of gravity like Loop Quantum Gravity [8] and still a very limited number of observations in the regime where they can be important.

C. Dark matter etc. 

The two examples presented in this section are very important, but they are not the only ones. There are many other places in cosmology where Bayesian inference is and should be applied. For example, we still do not know what dark matter (the second dominant component of the Universe) is. It can be an axion, a higgsino, a gluino or other superparticles [9], or just neutrinos etc. It is an ideal place for Bayesian inference to point out the best candidate. But we still know too little from observations.

There is a huge number of cases where we can apply Bayesian inference in cosmology. As we have mentioned many times, it is a perfect place to use these methods. We have scarce observations and a lot of theories. Some of them are simple, some are pure models, some are brilliant new ideas. This is like our example with the doorbell. We hear the bell and we must predict who is outside the door. Without any information we cannot predict who is ringing, because the sound of the bell is always the same. However, some people ring only once and some of them more often. Therefore we must listen very carefully when something is ringing in cosmology.

V. WHO IS A BAYESIAN? - SOME MORE GENERAL EPISTEMIC REMARKS

It is an obvious fact that Bayes' theorem is a theorem. Nevertheless, it should be explicitly stated that to be a Bayesian is more than to know and use the theorem. On the one hand, the Bayesian theory of confirmation has had great success in such areas of human activity as physics, biology, medicine and cognitive science (decision theory); on the other hand, the method encounters serious limitations. A radical Bayesian would probably say that it can always be applied: we are always dealing with a joint distribution, and a numerical representation of belief is always possible. However, Bayesian epistemology can be treated not only as competing with other methods of confirmation, but as a way of building a specific (not exclusive) model of beliefs and the related evidence, which can be easily elaborated and understood.

In the Bayesian approach probability is attributed to the hypotheses which are being confirmed. This confirmation can be interpreted both qualitatively and quantitatively, since inference is based on empirical data and there are relations between hypotheses, theories and observations to be explicated. The meaning ascribed to probability is indeed a crucial point in understanding Bayesian inference. Using probabilistic methods one can measure two things: how often a specific event occurs and how strong the evidence (confirming our beliefs) is.

Let us state in general that Bayesianism can be treated as an epistemic theory which examines the relation between beliefs and empirical evidence in order to measure the strength of the beliefs. As has been shown, the most important concept used to reach that goal is the notion of conditional probability [10]. Using Bayesian inference we not only measure the strength of beliefs but also propose a method for rationally estimating the change of beliefs under the influence of new evidence.

Sometimes, among scientists and philosophers of science, there is some hesitation about the exclusiveness of such an approach. S. Okasha, in his study of van Fraassen's conception of induction, wrote [11]:




He accepts the Bayesian representation of opinion in terms of degrees-of-belief, and he agrees that synchronic probabilistic coherence is a necessary condition of rationality. However, he does not accept the Bayesian thesis that conditionalization is the only rational way to respond to new evidence; though he allows that it is a rational way.



It can be said in that sense that Bayesianism offers a solution to old problems of induction. We have an approximately coherent and reasonable model for correcting probabilities. Of course, this is possible only given initial probabilities (priors) and evidence. The proposed solution has its weakness: the method often tells us nothing about how to estimate these probabilities.

There are two groups among Bayesians, who differ with respect to the criteria used in choosing priors: objective Bayesians (E. T. Jaynes [12], H. Jeffreys [13], R. D. Rosenkrantz [14]) and subjective Bayesians (B. De Finetti [15], C. Howson and P. Urbach [16]).


It is not only the problem (or problems) of induction that Bayesian inference tries to deal with, but also the problem of finding a justification of inductive inference itself, which can be explicated in several schemas:

• Inductive Generalization

Nobody denies that a finite number of experimental data cannot deliver an exhaustive proof of a universal statement but, according to induction, empirical evidence confirms a generalization [17, 18]. An example is enumerative induction, whose principle can be explicated as follows:

Some crows are black. 
Therefore, all crows are black. 



Every load added so far has not damaged this truck. 
Therefore, the next piece of load will not damage this truck. 



• Hypothetical Induction

It occurs when some hypothesis deductively entails the evidence.

Evidence confirms a hypothesis if the evidence is a logical consequence of the hypothesis.

When there are multiple competing hypotheses, one can try to show that the falsity of the hypothesis entails the falsity of the evidence, or use additional criteria for selecting hypotheses, like simplicity or inference to the best explanation. However, these proposals supply markets of hypotheses (for example cosmological models) with rules for successful selection, but in fact they cannot give any rational explanation of the evidence. It may be a truism, but the difference between explanation and confirmation should be treated with special care, the more so because Bayesians do not agree on how to represent the degree to which evidence supports a hypothesis. The three most popular options are:

- a difference measure: $P(H|E) - P(H)$;

- a normalized difference measure: $P(H|E) - P(H|\neg E)$;

- a likelihood measure: $P(E|H)/P(E) = P(H|E)/P(H)$.
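The measures listed above can be compared on a small numerical example. The sketch below is an illustration only: the function name and the input probabilities are hypothetical, and the third measure is taken to be the ratio $P(H|E)/P(H)$, which by the Bayes theorem equals $P(E|H)/P(E)$.

```python
def confirmation_measures(p_h, p_e_given_h, p_e_given_not_h):
    """Degree to which evidence E supports hypothesis H, in three conventions.
    Requires 0 < P(E) < 1 so that both conditional probabilities exist."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)   # total probability
    p_h_given_e = p_e_given_h * p_h / p_e                      # Bayes theorem
    p_h_given_not_e = (1.0 - p_e_given_h) * p_h / (1.0 - p_e)
    return {
        "difference": p_h_given_e - p_h,
        "normalized_difference": p_h_given_e - p_h_given_not_e,
        "ratio": p_h_given_e / p_h,
    }

# Hypothetical numbers: an even prior and evidence four times likelier under H.
measures = confirmation_measures(p_h=0.5, p_e_given_h=0.8, p_e_given_not_h=0.2)
print(measures)
```

The three conventions agree on whether the evidence confirms the hypothesis but assign it different strengths, which is why the choice of measure has to be stated before degrees of confirmation are compared.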

Inductive generalization, which follows a simple pattern of extrapolation from particular data to general conclusions, suffers from several problems called paradoxes of confirmation. Goodman's paradox, known in the literature as the problem of "grue", is particularly interesting [19], especially the question of its counterpart in the field of cosmology. In the traditional version:

• We have two hypotheses: (1) all emeralds are green and (2) all emeralds are grue (green if examined before some time $t_x$ and blue otherwise).

• Evidence: a green emerald has been found, which confirms both (1) and (2).

Any satisfactory resolution of the paradox requires additional assumptions, for example pointing to "green" rather than "grue" as a natural kind term.

In a search for a possible cosmological version of the paradox we can, for example, try to compare two related models: the CDM (Cold Dark Matter) cosmological model and the LCDM (Lambda Cold Dark Matter) cosmological model with a positive cosmological constant term, which seems to be the simplest candidate for a description of dark energy. The Bayesian method of confirmation applied to select between these two models gives quite opposite verdicts when used in the 1990s and today. However, in our opinion it is a misunderstanding to treat this case study as a paradox in Goodman's sense. This becomes obvious because, when new observational data confirm the LCDM model better than the CDM model at the same time, the latter simply disappears from the stage. A paradox of confirmation would occur if, within a certain family of models, the same degree of confirmation (at the same time and on the same evidence) were assigned to hypotheses differing from each other, for example with regard to foreseeable future scenarios of the evolution of the Universe.

It is often said that theoretical scientific research means achieving two specific goals: (1) finding a model which best approximates a phenomenon and (2) constructing a hypothesis that offers the best prediction. This is a good context in which to show how two criteria of model selection are compared: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) [20]. Although these model comparison methods are treated as competitors, they in fact ask different questions [21]. The AIC estimates the predictive power of an elaborated hypothesis, while the BIC estimates goodness of fit [22]. M. Forster and E. Sober have explained this nuance with respect to the fitting problem [23, pp. 5-9]:

Even though a hypothesis with more adjustable parameters would fit the data better, scientists seem to be willing to sacrifice goodness-of-fit if there is a compensating gain in simplicity. (...)

Since we assume that observation is subject to error, it is overwhelmingly probable that the data we obtain will not fall exactly on that true curve. (...) Since the data points do not fall exactly on the true curve, such a best-fitting curve will be false. If we think of the true curve as the 'signal' and the deviation from the true curve generated by errors of observation as 'noise', then fitting the data perfectly involves confusing the noise with the signal. It is overwhelmingly probable that any curve that fits the data perfectly is false.
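The trade-off between fit and simplicity described in the quotation is what the two information criteria encode. The text does not spell out their definitions, so the sketch below uses the standard ones, $\mathrm{AIC} = 2k - 2\ln L_{\max}$ and $\mathrm{BIC} = k \ln N - 2\ln L_{\max}$, where $k$ is the number of free parameters and $N$ the number of data points. The log-likelihood values in the example are hypothetical.

```python
import math

def aic(max_log_like, k):
    """Akaike information criterion: 2k - 2 ln L_max (lower is preferred)."""
    return 2.0 * k - 2.0 * max_log_like

def bic(max_log_like, k, n):
    """Bayesian information criterion: k ln N - 2 ln L_max (lower is preferred)."""
    return k * math.log(n) - 2.0 * max_log_like

# Hypothetical case: a 3-parameter model fits slightly better than a 2-parameter one.
n_points = 6
simple = {"log_like": -3.0, "k": 2}
complex_ = {"log_like": -2.9, "k": 3}
for name, m in (("simple", simple), ("complex", complex_)):
    print(name, aic(m["log_like"], m["k"]), bic(m["log_like"], m["k"], n_points))
```

In this hypothetical case the small gain in fit does not pay for the extra parameter under either criterion. The two penalties differ in strength: the BIC penalty per parameter grows with the size of the data set and exceeds the AIC penalty once $\ln N > 2$.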

The general comments of this section can be summed up by the statement that Bayesian inference is a method dedicated to specific goals in scientific practice [24]. With respect to cosmology, the LCDM-CDM model comparison mentioned above reveals another problem in the context of Bayesian inference. It concerns the currently changing concept of a model in physics [25]. At present special emphasis is placed on the effectiveness and the mediating function of models in physics. This conception of mediating models was studied, for example, by Morrison. The status of scientific models is determined by the way they are designed: they are neither simply derived from the underlying theory nor fixed by the evidence alone. Their "nature" is determined by a mediating role (between a theory and phenomena). Morrison states [26, p. 67]:

Although they are designed for a specific purpose these models have an autonomous role to play in supplying information, information that goes beyond what we are able to derive from the data/theory combination alone.

In cosmology built on general relativity, solutions of the Einstein equations can be treated as models of the Universe. A model is constructed by assuming specific idealizations and ceteris paribus conditions. In practice this means that we reduce the number of problems which the general theory implies. Assuming idealizations means passing over specific things that affect a phenomenon systematically; ceteris paribus conditions correspondingly concern occasional influences. It is said that such formulations of scientific laws are certain approximations of the investigated phenomena. There has recently been quite an important and interesting discussion about the validity of applying Bayesian inference to idealization itself [27, 28]. The problem concerns idealized hypotheses and the question of assigning probability to them, since they can be treated as counterfactuals. What is the posterior probability of the ideal gas law or of the law of motion of a simple pendulum? Jones showed that the solution lies precisely in understanding the procedure of elaborating a model. If we treat the model idealizations not as a result of abstraction but as a distortion, the methodological consequences can indeed exclude Bayesian inference:



Given that most scientific hypotheses are idealized in some way, Bayesianism seems to entail that most scientific hypotheses cannot be confirmed. Bayesians thus confront an apparent trilemma: either develop a coherent proposal for to assign prior probabilities to counterfactuals; or embrace the counterintuitive result that idealized hypotheses cannot be confirmed; or reject Bayesianism.

The general Bayesian conception of empirical evidence can be put into three main statements/consequences:

• Less probable evidence delivers the best confirmation of a hypothesis.

• Evidence better confirms those hypotheses in the context of which it is more probable.

• If the probability of a hypothesis is very small, it can be confirmed only by very strong evidence.

The crucial point in Bayesian inference lies in the fact that it is able to deal with a problem only if some input probabilities are supplied. This makes sense, since we never start reasoning with absolutely no knowledge. The results we achieve, $P(H|E)$, are always "conditional": they reveal a property of $H$ which is not objective but relative to $E$ and to certain knowledge, called background knowledge. A more subtle epistemic analysis can be carried out with reference to the types of background knowledge. P. Wang studied the problem of underlying knowledge and discerned two types of conditions which influence the evaluation of the probability distribution function [29, pp. 98-99]:

• explicit conditions in the formulas, as follows: $P(E|K)$

- E is a binary proposition,

- belongs to proposition space (then we can define $P_0(E)$),

- $P_0(E) > 0$;

• implicit conditions

- non-binary propositions allowed,

- there may be statements outside the proposition space,

- "Even if a proposition is assigned to a prior probability of zero according to one knowledge source, it is possible for the proposition to be assigned a non-zero probability according to another knowledge source".

All these distinctions are not trivial, since we have to answer the question whether all background knowledge can be probabilistic-valued. This is one of the most important epistemic questions of the Bayesian Theory of Confirmation, alongside others:

• Are there degrees of belief? The answer perhaps lies in an attempt to distinguish 'rational' degrees of belief from belief in general. Are the corrections in probability gained in the Bayesian procedure just new probabilistic information, or do they deliver a new reason to believe that the proposition considered is true?

• When are we actually updating our belief, and when is there just a revision of probability (the known problem of old evidence)?

VI. SUMMARY AND OUTLOOK 

In this paper we have introduced the basics of Bayesian inference and shown how to use it in practice. We have presented some mathematical background and formulated the problem in analogy with thermodynamics. Then known Monte Carlo methods and the Metropolis algorithm were used to select among simple models. Finally we have shown possible applications of Bayesian inference in modern cosmology.

We have also studied epistemological aspects of Bayesian Confirmation Theory in the context of problems of modern cosmology, where the Bayesian approach offers not only estimation of model parameters from observational data but also methods of comparing (selecting) models. We have demonstrated that Bayesian inference rests on some assumptions of a philosophical character.

APPENDIX A: MARKOV CHAINS IN THE PARAMETER SPACE 

In this appendix we visualize Markov chains in the parameter space for the models considered. We take the values $\lambda = 0.1, 1, 10, 100$ of the parameter, where $1/\lambda = T$, so that a high value of $\lambda$ corresponds to a low temperature. We investigate a broad range of $\lambda$ here; however, for the calculation of the evidence only values $\lambda \in [0, 1]$ are important.

APPENDIX B: VISUAL PROOF OF ERGODICITY 

A very important question related to Monte Carlo simulations is the ergodicity of the algorithm. Ergodicity means that in a finite number of steps (finite time) the system can come arbitrarily close to any point in the phase space. In Monte Carlo simulations it ensures that we can always find the proper energy minimum, even at very low temperatures, when fluctuations are small. To check this we generated Markov chains in a low-temperature system. In such a system, when the algorithm is not ergodic, a Markov chain does not always lead to the proper minimum. In Fig. 8 we show Markov chains in the parameter space for the third model considered in the text. We show that, starting from different points in the parameter space, the system always goes to the same place, where the proper minimum lies. This is a visual proof of ergodicity for the kind of function considered. It may not hold for more complicated functions.

FIG. 5: Top left: $\lambda = 0.1$. Top right: $\lambda = 1$. Bottom left: $\lambda = 10$. Bottom right: $\lambda = 100$.



[1] J. L. Rodriguez-Fernandez, Endeavour 23 3, 121 (1999).
[2] D. J. C. MacKay, Information Theory, Inference and Learning Algorithms (Cambridge University Press, Cambridge, 2002).
[3] A. G. Riess et al. (Supernova Search Team), Astron. J. 116, 1009 (1998), astro-ph/9805201.
[4] M. Szydlowski, A. Kurek, and A. Krawiec, Phys. Lett. B642, 171 (2006), astro-ph/0604327.



FIG. 6: Top left: $\lambda = 0.1$. Top right: $\lambda = 1$. Bottom left: $\lambda = 10$. Bottom right: $\lambda = 100$.

[5] A. Kurek and M. Szydlowski (2007), astro-ph/0702484.
[6] U. H. Danielsson, Phys. Rev. D66, 023511 (2002), hep-th/0203198.
[7] J. Mielczarek, JCAP 0811, 011 (2008), 0807.0712.
[8] A. Ashtekar and J. Lewandowski, Class. Quantum Grav. 21, R53 (2004), gr-qc/0404018.
[9] G. Jungman, M. Kamionkowski, and K. Griest, Phys. Rept. 267, 195 (1996), hep-ph/9506380.
[10] M. Strevens, The Bayesian Approach to the Philosophy of Science (Macmillan Encyclopedia of Philosophy, 2006), 2nd ed.
[11] S. Okasha, Stud. Hist. Phil. Sci. 31 4, 691 (2000).
[12] E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge University Press, Cambridge, 2003).
[13] H. Jeffreys, Theory of Probability (Oxford University Press, Oxford, 1961), 3rd ed.
[14] R. D. Rosenkrantz, Inference, Method and Decision: Towards a Bayesian Philosophy of Science (D. Reidel, Dordrecht, 1977).
[15] B. De Finetti, Theory of Probability (John Wiley and Sons Ltd, London, 1974).



FIG. 7: Top left: $\lambda = 0.1$. Top right: $\lambda = 1$. Bottom left: $\lambda = 10$. Bottom right: $\lambda = 100$.

[16] C. Howson and P. Urbach, Scientific Reasoning: the Bayesian Approach (Open Court, La Salle, IL, 1989).
[17] R. Carnap, Logical Foundations of Probability (Chicago University Press, Chicago, 1950).
[18] H. Reichenbach, The Theory of Probability (University of California Press, Berkeley CA, 1949).
[19] N. Goodman, Fact, Fiction and Forecast (University Press, Harvard, 1955).
[20] A. R. Liddle et al. (2007), astro-ph/0703285.
[21] M. Szydlowski and A. Kurek (2008), 0801.0638.
[22] E. Sober, Proceedings of the British Academy 113, 21 (2002).
[23] M. Forster and E. Sober, Brit. J. Phil. Sci. 45, 1 (1994).
[24] E. V. Linder and R. Miquel (2007), astro-ph/0702542.
[25] M. Morrison, Poznań Studies in the Philosophy of the Sciences and the Humanities 86, 145 (2006).
[26] M. Morrison, Philosophia Naturalis 54, 65 (1998).
[27] M. Shaffer, Phil. Sci. 68, 36 (2001).
[28] N. Jones, American Philosophical Association Central Division Meeting, Chicago (2007), http://philsci-archive.pitt.edu/archive/00003101.
[29] P. Wang, Artificial Intelligence 158, 97 (2004).



